System and method for fetching an information unit

ABSTRACT

A device and a method for fetching an information unit, the method includes: receiving a request to execute a write through cacheable operation of the information unit; emptying a fetch unit from data, wherein the fetch unit is connected to a cache module and to a high level memory unit; determining, when the fetch unit is empty, whether the cache module stores an older version of the information unit; and selectively writing the information unit to the cache module in response to the determination.

FIELD OF THE INVENTION

The present invention relates to methods and devices for fetching an information unit, and especially to methods and devices for retrieving an information unit by a cache module that supports a speculative fetch and write through policy.

BACKGROUND OF THE INVENTION

Cache modules are high-speed memories that facilitate fast retrieval of information, including data and instructions. Typically, cache modules are relatively expensive and are characterized by a small size, especially in comparison to higher-level memory modules.

The performance of modern processor-based systems usually depends upon the cache module performance, and especially upon the relationship between cache hits and cache misses. A cache hit occurs when an information unit that is present in a cache module memory is requested. A cache miss occurs when the requested information unit is not present in the cache module and has to be fetched from another memory that is termed a higher-level memory module.

Various cache module and processor architectures, as well as data retrieval schemes, were developed over the years to meet increasing performance demands. These architectures included multi-port cache modules, multi-level cache module architectures, super-scalar processors and the like.

The following U.S. patents and U.S. patent applications, all being incorporated herein by reference, provide a brief summary of some state of the art cache modules and data fetch methods: U.S. Pat. No. 4,853,846 of Johnson et al., titled “Bus expander with logic for virtualizing single cache control into dual channels with separate directories and prefetch for different processors”; U.S. patent application 20020069326 of Richardson et al., titled “Pipelined non-blocking level two cache system with inherent transaction collision-avoidance”; U.S. Pat. No. 5,742,790 of Kawasaki, titled “Detection circuit for identical and simultaneous access in a parallel processor system with a multi-way multi-port cache”; U.S. Pat. No. 6,081,873 of Hetherington et al., titled “In-line bank conflict detection and resolution in a multi-ported non-blocking cache”; and U.S. Pat. No. 6,272,597 of Fu et al., titled “Dual-ported, pipelined, two level cache system”.

Processors and other information requesting components are capable of requesting information from a cache module and, alternatively or additionally, from another memory module that can be a higher-level memory module. The higher-level memory module can also be a cache memory, another internal memory and even an external memory.

There are various manners to write information into a cache module or a higher-level memory module. Write-through involves writing one or more information units to the cache module and to the higher-level memory module substantially simultaneously.

Some prior art cache modules include multiple lines that in turn are partitioned into segments. Each segment is associated with a validity bit and a dirty bit. The validity bit indicates whether a certain segment includes valid information. The dirty bit indicates if the segment includes valid information that was previously updated but not sent to the higher-level memory module. If a write back policy is implemented, only the segments that are associated with an asserted dirty bit are written to the high-level memory module.
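
By way of illustration only, the per-segment bookkeeping and the dirty-segment write back described above can be sketched in C as follows; all names, the segment size and the upstream callback are hypothetical rather than taken from any figure:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical per-segment state: a validity bit and a dirty bit. */
    typedef struct {
        uint8_t payload[16]; /* segment contents (size is illustrative) */
        bool valid;          /* segment holds valid information */
        bool dirty;          /* updated locally but not yet sent upstream */
    } segment_t;

    /* Under a write back policy only dirty segments are copied out. */
    static size_t write_back(segment_t *seg, size_t n,
                             void (*send_upstream)(const segment_t *)) {
        size_t written = 0;
        for (size_t i = 0; i < n; i++) {
            if (seg[i].valid && seg[i].dirty) {
                send_upstream(&seg[i]); /* write to the high-level memory module */
                seg[i].dirty = false;   /* the segment is clean again */
                written++;
            }
        }
        return written;
    }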

Some prior art cache modules perform mandatory fetch operations and speculative fetch operations. The latter are also known as pre-fetch operations. A mandatory fetch operation involves fetching an information unit that caused a cache miss. The speculative fetch operations are aimed to reduce cache miss events, and replace not-valid segments with valid segments.

When applying both speculative fetch operations and a write-through policy, the high-level memory module can replace an updated segment residing in the cache memory with a non-updated segment. This can cause a coherency problem.

The following U.S. patents and patent applications illustrate various devices and systems for solving coherency problems: U.S. Pat. Nos. 6,574,714, 6,662,275, 6,021,468 and 6,374,330 of Arimilli et al.; U.S. Pat. No. 6,868,482 of Mackenthum et al.; U.S. Pat. No. 6,249,520 of Steely et al.; U.S. Pat. No. 5,953,538 of Duncan; U.S. Pat. No. 6,233,656; and U.S. Pat. No. 6,848,030.

There is a need to provide an efficient method and device for fetching information to a cache module.

SUMMARY OF THE PRESENT INVENTION

Method and system for fetching an information unit, as described in the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:

FIG. 1 is a schematic diagram of a device, according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a sub-system, according to an embodiment of the invention;

FIG. 3 is a schematic illustration of a data cache module, according to an embodiment of the invention;

FIG. 4 is a schematic illustration of cache logic, according to an embodiment of the invention;

FIG. 5 is a schematic illustration of a structure of the data cache module, according to an embodiment of the invention;

FIG. 6 is a detailed description of a data channel, according to an embodiment of the invention;

FIG. 7 is a schematic illustration of a device, according to an embodiment of the invention;

FIG. 8 is a flow chart of a method for fetching an information unit, according to an embodiment of the invention; and

FIG. 9 is a flow chart of a method for fetching an information unit, according to another embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The following description relates to data fetch operations and to a data cache module.

The device is adapted to delay a cache miss/hit determination until a fetch unit is empty (does not store data), thus solving a possible coherency problem. In this case, even if an older version of the information unit is pre-fetched after the processor provides a newer version of that information unit, the delayed hit/miss check will detect the older version of the data unit and the cache module will overwrite this older version.

FIG. 1 illustrates a device 10 according to an embodiment of the invention. Device 10 includes a sub-system 100 that in turn includes a first requesting component such as first processor 110 and also includes a multi-port data cache module (denoted 200 in FIG. 2). Device 10 further includes a system bus 60 that is connected to: (i) a second requesting entity such as second processor 20, (ii) high-level memory module 50, (iii) sub-system 100, (iv) peripherals 70, and (v) an external system I/F 80.

The high-level memory module 50 is an example of another memory module that is accessible by processor 110. It usually stores programs and data for the various processors. It can also be a second level cache memory module supporting off-chip memories, but this is not necessarily so. If a cache miss occurs the data can be fetched from the high-level memory module 50 or from other memory modules.

System bus 60 is connected to sub-system 100 via a gasket (also referred to as interface) 380. Various fetch operations utilize interface 380.

Device 10 also includes a DMA system bus 90 that is connected to a DMA controller 30, to multiple peripherals 40 and to the shared memory module 370, via DMA interface 382. The DMA system bus 90 can be used by external components, such as processor 20, to access the shared memory module 370.

FIG. 2 illustrates a sub-system 100 of device 10, according to an embodiment of the invention. Sub-system 100 includes a processor 110, data channel 130, Memory Management Unit (MMU) 300, instruction channel 340, level-one RAM memory 370 as well as interface unit 380.

Processor 110 and the instruction channel 340 are connected to program bus 120. Instruction channel 340 includes an instruction cache and an instruction fetch unit. MMU 300 is adapted to translate virtual addresses to physical addresses, as well as to generate various cache and bus control signals.

Processor 110 includes first data port 116 and second data port 118. The first data port 116 is connected, via first data bus (XA) 122, to a first port 132 of the data channel 130, to MMU 300 and to the level-one RAM memory 370. The second data port 118 is connected, via second data bus (XB) 124, to a second port 134 of the data channel 130, to MMU 300 and to the level-one RAM memory 370. Processor 110 is capable of generating two data addresses per cycle.

The data channel 130 is connected, via data fetch bus 126, to an interface 380 that in turn is connected to one or more additional memories such as the high-level memory 50. The additional memories can be a part of a multi-level cache architecture, whereas the data cache module 200 is the first level cache module and the other memories are level-two cache memories. They can also be a part of an external memory that is also referred to as a main memory.

Data channel 130 includes a data cache module 200, and multiple supporting units such as a Write Through Buffer (WTB) 155, a Data Fetch Unit (DFU) 170, an optional Write Back Buffer (WBB) 180 and a Data Control Unit (DCU) 150. DFU 170 is responsible for data fetching and pre-fetching. Data fetching operations can include mandatory fetching operations and speculated fetching operations. Mandatory fetching operations include retrieving a data unit that caused a cache miss. Speculated fetching (also termed pre-fetching) operations include retrieving data units that did not cause a cache miss. Usually this latter type of data is expected to be used soon after the pre-fetch. This expectation is usually based on an assumption that many data requests are sequential in nature.

It is assumed that each fetch operation involves fetching a single basic data unit (BDU). Accordingly, a BDU that is fetched during a mandatory fetch operation is referred to as a mandatory BDU and a BDU that is fetched during a speculated fetch operation is referred to as a speculated BDU. It is further noted that the size of a BDU can depend upon the memory module from which it is initially fetched, but for simplicity of explanation it is assumed that all the BDUs have the same size.

WBB 180 temporarily saves data that is written into the main memory during a write-back operation. A write back operation occurs when data that was previously written into the data cache module 200 is replaced.

Processor 110 is capable of issuing two data requests simultaneously, via buses XA 122 and XB 124. The data channel 130 processes these requests to determine if one or more cache hits occurred. Basically, the data channel 130 can decide that the two data requests resulted in a cache hit, that both requests resulted in a cache miss, or that one request resulted in a cache hit while the other resulted in a cache miss.

According to an embodiment of the invention processor 110 is stalled until it receives all the data it requested, but this is not necessarily so. For example, according to another embodiment of the invention, only portions of the processor are stalled.

There are various manners for starting and ending the stalling stage. A cache miss can trigger entrance to such a stage. It is assumed that processor 110 enters a stalled stage once it receives a cache miss indication from data channel 130. Processor 110 exits the stall stage once it receives an indication from the data channel 130 that the requested data is available. Line 302, connecting processor 110 and data channel 130, conveys a stall signal that can cause processor 110 to enter a stalled stage and exit such a stage.

FIG. 3 is a schematic illustration of data cache module 200, according to an embodiment of the invention. Data cache module 200 includes logic, such as cache logic 210, and cache memory bank 250. The cache memory bank 250 includes one hundred and twenty-eight lines 250(0)-250(127), each line including sixteen 128-bit long basic data units. These basic data units (BDUs) are denoted 252(0,0)-252(127,15). A cache hit or cache miss is determined on a BDU basis. It is noted that the logic can be located outside the cache module, but this is not necessarily so.
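
For orientation, the stated geometry (one hundred and twenty-eight lines of sixteen 128-bit BDUs, with a hit/miss decision per BDU) can be modeled by a minimal C sketch; the type names and the two-word BDU representation are assumptions:

    #include <stdint.h>

    #define CACHE_LINES   128
    #define BDUS_PER_LINE 16
    #define BDU_BITS      128

    /* A 128-bit BDU modeled as two 64-bit words. */
    typedef struct { uint64_t word[BDU_BITS / 64]; } bdu_t;

    /* Hit/miss is determined per BDU, so validity is tracked per BDU. */
    typedef struct {
        bdu_t    bdu[CACHE_LINES][BDUS_PER_LINE]; /* 252(0,0)-252(127,15) */
        uint16_t bdu_valid[CACHE_LINES];          /* one validity bit per BDU */
    } cache_bank_t;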

Cache logic 210 also receives a fetch unit empty indication 172 that indicates that the fetch unit 170 is empty. This indicates that there are no more information units that are being fetched by a speculative fetch operation. If an older version of an information unit was pre-fetched after a more updated version of the information unit was provided by processor 110, the cache logic 210 will perform a hit/miss check after the older version is written to the cache module and will overwrite the older version.

FIG. 4 is a schematic illustration of a portion 212 of cache logic 210, according to an embodiment of the invention. Cache logic 210 is capable of managing two data requests simultaneously and includes two identical portions, 212 and 214, each capable of determining whether a single cache hit or cache miss has occurred. For simplicity of explanation only a first portion 212 of the cache logic 210 is illustrated in detail.

Cache logic 210 receives fetch unit empty indication 172 and can delay a hit/miss determination until receiving such an indication. The delay can include delaying a comparison by various comparators 230(0)-230(7), delaying the retrieval of validity bits, and can also delay the output of the HIT signal by logic gate 246. This delay can be implemented by timing controller 242.

According to an embodiment of the invention cache logic 210 also includes a predefined address comparator 244 that can compare the address of the information unit (or a portion of said address) to a predefined range of addresses to determine whether the delaying of the hit/miss decision is required. Predefined address comparator 244 can send an appropriate disable/enable signal to timing controller 242.

The cache logic 210 includes eight ways denoted WAY0-WAY7 220(0)-220(7). Each way stores address and status information that is associated with sixteen lines. The address information includes a tag address and the status information includes BDU validity and update information. For simplicity of explanation only WAY0 220(0) is illustrated in detail, while the other ways are represented by boxes 220(1)-220(7).

Each line is associated with an extended tag value and with sixteen BDU validity bits, representative of the validity of each BDU within that line. WAY0 220(0) stores sixteen extended tag addresses 220(0)-220(15), as well as sixteen sets of sixteen BDU validity flags 220(0,0)-220(15,15).

Each BDU can also be associated with a dirty bit that indicates if the BDU was modified without being written to the higher-level memory module.

Once processor 110 provides an address 400 over the first data bus XA 122, the first portion 212 of cache logic 210 processes this address to determine whether the requested data is stored in the cache module (cache hit) or not (cache miss). If a cache hit occurs the requested data is sent to processor 110 over an appropriate data bus out of XA 122 or XB 124. Otherwise, the DFU 170 is notified about the cache miss.

Address 400 is partitioned into a 20-bit tag address 402 that includes the twenty most significant bits of address 400, a 4-bit line index 404, a 4-bit BDU offset 405 and a 4-bit byte offset 408. The 4-bit byte offset is used for data retrieval from the cache memory bank 250. The cache module 200 can be addressed by virtual addresses, while the higher-level memory module is accessed by physical addresses. Accordingly, the MMU 300 performs address translation only when BDUs are fetched from the high-level memory module 50.
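
The field widths given above add up to a 32-bit address, so the partitioning can be illustrated by the following C sketch; the exact bit positions are inferred from those widths and are otherwise an assumption:

    #include <stdint.h>

    /* Partitioning of address 400: 20-bit tag address 402, 4-bit line
     * index 404, 4-bit BDU offset 405 and 4-bit byte offset 408. */
    typedef struct {
        uint32_t tag;         /* bits 31..12 */
        uint32_t line_index;  /* bits 11..8  */
        uint32_t bdu_offset;  /* bits 7..4   */
        uint32_t byte_offset; /* bits 3..0   */
    } parsed_addr_t;

    static parsed_addr_t parse_address(uint32_t addr) {
        parsed_addr_t p;
        p.tag         = addr >> 12;        /* twenty most significant bits */
        p.line_index  = (addr >> 8) & 0xF;
        p.bdu_offset  = (addr >> 4) & 0xF;
        p.byte_offset = addr & 0xF;
        return p;
    }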

Each of the sixteen tag addresses 220(0)-220(15) stored within WAY0 220(0) is compared, in parallel, by comparators 230(0)-230(7), to an extended 28-bit tag address 410 that includes the 20-bit tag address 402 as well as an 8-bit DID 414. Those of skill in the art will appreciate that such a comparison takes place at all ways in parallel. Comparison results from each of these comparators are sent to multiplexer 247 that selects a comparison result in response to line index 404. The selected result (denoted TAG MATCH) is provided to one input of OR gate 246.

In addition, the BDU offset 405 and the 4-bit line index 404 are used to retrieve a validity flag that corresponds to the requested BDU. The 4-bit line index 404 is used for selecting a set of BDU validity flags out of the sixteen sets of WAY0, while the 4-bit BDU offset 405 is used for selecting a validity flag out of the selected set of BDU validity flags.

The selection is done by multiple switches 243(1)-243(7) and by output switch 241. Each switch out of the multiple switches 243(1)-243(7) is connected to the BDU validity flags of a single way and is controlled by BDU offset 405. Output switch 241 is connected to the outputs of multiple multiplexers 241(0)-241(7) and is controlled by line index 404. The output of output switch 241 is connected to a second input of OR gate 246 and is referred to as VALID.

OR gate 246 outputs a CACHE_HIT/MISS_A signal. A cache hit occurs if there is a match between one of the stored tag addresses and the extended tag address and if the selected BDU is valid.
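
The stated hit condition, a tag match in some way together with a valid selected BDU, can be summarized by a minimal C sketch; the per-way layout follows the description of WAY0, while all names are hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_WAYS      8
    #define LINES_PER_WAY 16

    /* Per way: one extended tag and sixteen BDU validity flags per line. */
    typedef struct {
        uint32_t ext_tag[LINES_PER_WAY];   /* 28-bit extended tag addresses */
        uint16_t bdu_valid[LINES_PER_WAY]; /* one validity bit per BDU */
    } way_t;

    /* A cache hit requires a tag match and a valid requested BDU. */
    static bool is_cache_hit(const way_t ways[NUM_WAYS], uint32_t ext_tag,
                             unsigned line_index, unsigned bdu_offset) {
        for (unsigned w = 0; w < NUM_WAYS; w++) { /* all ways, in parallel in hardware */
            bool tag_match = ways[w].ext_tag[line_index] == ext_tag;
            bool valid = (ways[w].bdu_valid[line_index] >> bdu_offset) & 1u;
            if (tag_match && valid)
                return true;  /* cache hit */
        }
        return false;         /* cache miss */
    }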

DFU 170 receives indications of cache hits and cache misses. If both data requests resulted in a cache hit the DFU 170 is not required to perform a mandatory fetch. If one or both of the data requests resulted in a cache miss the DFU 170 is required to perform one or more mandatory fetch operations.

Fetch bus 126 allows fetching a single BDU per fetch operation. A typical fetch burst includes four consecutive fetch operations, thus a total of four BDUs can be retrieved during a single fetch burst.

Typically, memory modules that are adapted to perform fetch bursts are partitioned into fixed-size data unit sets. A fetch burst that includes a request to receive a certain data unit will result in the retrieval of that entire set. The order of the fetched data units depends upon the specific requested data set.

Sub-system 100 is configured in a manner such that a fetch burst cannot be interrupted. Thus, if more than a single cache miss occurs simultaneously, there is a great benefit in retrieving more than one mandatory BDU during a single fetch burst. This efficient fetching scheme can reduce the processor stall period, especially as processor 110 is stalled until it receives both mandatory BDUs.
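
As a rough sketch of this set-based burst behavior, the following C fragment lists the BDUs returned for a requested BDU; the four-unit set size follows the four-BDU burst described above, while the wrap-around ordering is an assumption, since the text only states that the order depends on the requested data:

    #include <stdio.h>

    #define BURST_LEN 4 /* four consecutive fetch operations per burst */

    /* A burst retrieves the whole aligned set that contains the requested
     * BDU, starting from the requested BDU and wrapping around the set. */
    static void burst_order(unsigned requested_bdu, unsigned out[BURST_LEN]) {
        unsigned base = requested_bdu & ~(unsigned)(BURST_LEN - 1);
        for (unsigned i = 0; i < BURST_LEN; i++)
            out[i] = base + ((requested_bdu - base + i) % BURST_LEN);
    }

    int main(void) {
        unsigned order[BURST_LEN];
        burst_order(6, order); /* requesting BDU 6 retrieves the set 4..7 */
        for (unsigned i = 0; i < BURST_LEN; i++)
            printf("%u ", order[i]); /* prints: 6 7 4 5 */
        printf("\n");
        return 0;
    }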

FIG. 5 is a schematic illustration of the structure of data cache module 200, according to an embodiment of the invention. Data cache module 200 includes a controller, although other configurations can be provided, such as a configuration in which the controller is not a part of the data cache module. The data cache module can be connected to one or more controllers.

The cache module 200 is divided into two groups 200(1) and 200(2). The first group 200(1) includes four memory banks 201(2), 201(4), 201(6) and 201(8), each bank including two virtual memory banks (202(1), 202(2)), (202(3), 202(4)), (202(5), 202(6)), and (202(7), 202(8)), respectively, and a first I/O interface module 204.

The second group 200(2) includes four memory banks 211(2), 211(4), 211(6) and 211(8), each bank including two virtual memory banks (212(1), 212(2)), (212(3), 212(4)), (212(5), 212(6)), and (212(7), 212(8)), respectively, and a second I/O interface module 214.

Each memory bank is arranged as an array that includes sixty-four 256-bit wide rows. The addresses of the four memory banks that form each group are interleaved to reduce memory contention. The addresses of pairs of virtual memory banks that belong to the same memory bank are not interleaved.

The first I/O interface module 204 is connected in parallel, by two buses, to the four memory banks 201(2)-201(8), and the second I/O interface module 214 is connected in parallel, by two buses, to memory banks 211(2)-211(8).

A data cache module 200, as well as sub-system 100, has a finite capability of managing simultaneous information transfers. For example, data cache module contention may occur when the module receives two simultaneous access requests to different addresses within the same virtual memory bank. The access requests can be a part of read or write operations. In such a case one of the access requests is serviced after the other. This may cause processor 110 to stall. The finite capability is also expressed by the need to arbitrate between various bus requests, as implemented by the DCU 150. In this case the core can also be stalled.

The data cache module 200, and especially the cache logic 210, is connected to a controller, such as DFU 170, to provide indications about cache events. The requests of the DFU 170, as well as requests from various supporting units, such as the WBB 180 to complete write back operations, are sent to DCU 150 that arbitrates between the various requests. These various components exchange fetch request and fetch acknowledgement signals. The CACHE_A_HIT/MISS 201 signal is asserted in response to an occurrence of a cache miss event associated with a request to retrieve data over the first data bus XA 122. This signal is negated when a corresponding cache hit event occurs. The CACHE_B_HIT/MISS 203 signal is asserted in response to an occurrence of a cache miss event associated with a request to retrieve data over the second data bus XB 124. This signal is negated when a corresponding cache hit event occurs.

Cache module 200 may also include buffering means connected to the first data bus XA 122, to the second data bus XB 124 and/or to the data fetch bus 126.

FIG. 6 is a schematic illustration of data channel 130, according to an embodiment of the invention.

Various components of the data channel 130, including cache module 200, WTB 155, DFU 170 and WBB 180, can access a bus that is connected to other memory modules, such as high-level memory module 50. The requests are sent to DCU 150 that arbitrates between the bus requests. Conveniently, requests to fetch data to data cache 200 are generated by DFU 170 and sent to DCU 150.

DFU 170 is capable of determining a fetching scheme that in turn can include mandatory fetch operations as well as speculative fetch operations. The speculative fetch operations associated with different mandatory information units can be interlaced, but this is not necessarily so.

WBB 180 has eight entries of 256 bits each, for storing up to sixteen BDUs at a time. It has an input bus and an output bus.

WBB 180 is adapted to receive information units from the cache module 200 and send the information units to the high-level memory module 50. WBB 180 has limited buffering capabilities and is capable of separating the reception of information units from the cache module 200 from the writing of the information units to the high-level memory module 50. Usually, before new BDUs are written to the cache module 200, the cache module 200 automatically transfers out BDUs that have a lower probability of being re-read (usually older BDUs). It is noted that a BDU can be cache-locked, meaning that it is not thrashed.

WBB 180 is capable of generating a high-priority bus request and a low-priority bus request for sending at least one information unit to the high-level memory module 50. High-priority bus requests are generated in various scenarios, such as the reception of a flush instruction, a full or almost full WBB state, and a possible WBB incoherency event. A flush instruction forces the entire content of the WBB 180 to be sent to the high-level memory module 50.

A WBB incoherency event may occur when a processor requests an information unit that is stored within WBB 180. This information was flushed from the cache module 200, thus it can cause a cache miss event. A mandatory fetch operation to retrieve that information unit can eventually send an obsolete information unit to the processor 110. Instead, once WBB 180 detects that such an event can occur it sends its content to high-level memory module 50, waits until high-level memory module 50 is updated, and allows high-level memory module 50 to send the updated information unit to the processor 110.
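
The high-priority scenarios listed above suggest a simple request-generation rule. The following C sketch is one possible reading, with all status flags hypothetical:

    #include <stdbool.h>

    /* Hypothetical WBB status inputs, following the scenarios in the text. */
    typedef struct {
        bool flush_requested;   /* a flush instruction was received */
        bool full_or_near_full; /* WBB is full or almost full */
        bool incoherency_risk;  /* a requested unit is still inside the WBB */
        bool has_entries;       /* WBB holds at least one information unit */
    } wbb_status_t;

    typedef enum { WBB_NO_REQUEST, WBB_LOW_PRIORITY, WBB_HIGH_PRIORITY } wbb_request_t;

    /* High priority in the listed scenarios; otherwise a low-priority
     * request whenever there is something to write back. */
    static wbb_request_t wbb_bus_request(wbb_status_t s) {
        if (s.flush_requested || s.full_or_near_full || s.incoherency_risk)
            return WBB_HIGH_PRIORITY;
        return s.has_entries ? WBB_LOW_PRIORITY : WBB_NO_REQUEST;
    }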

The WTB 155 facilitates write through operations. It includes six entries. It is connected to the first and second data buses XA 122 and XB 124. It also has an output data bus. It is adapted to receive two entries at a time. It is capable of issuing write through requests of various priorities. Conveniently, the priority of the write through requests is higher than the priority of pre-fetch requests. WTB 155 can issue higher priority bus requests when processor 110 is stalled until the write through operation is completed.

The processor 110 can execute various coherency related operations including address range invalidation, address range synchronization and address range flush. Address range invalidation may involve resetting the valid and dirty bits associated with the relevant BDUs.

According to an embodiment of the invention processor 110 may define the data memory policy for each cache memory set of lines. This cache memory set of lines may correspond to a way, but this is not necessarily so. A cache write-back policy is conveniently applied to data that is to be re-used by a program. In such a case multiple write operations to the cache do not necessarily result in multiple transactions to the high-level memory module 50. On the other hand, if there is a low probability that a certain data segment will be re-used then the write through policy can be implemented.

There are various well-known manners to convey the data memory policy. It is assumed that the data memory policy is implemented by processor 110 that inserts appropriate values in a certain control register. MMU 300 in turn sends control signals that define the manner in which a data unit is written to the data channel 130. Such a control register can include two bits that define if the data memory policy is cacheable write through, cacheable write back or non-cacheable write through. In response, MMU 300 sends appropriate control signals to the various buffers and cache, including WBB 180 and WTB 155. The content of the certain control register may vary according to the cache memory set of lines that is involved.
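
A minimal C sketch of such a control register decode is given below; the two-bit encoding itself is an assumption, and only the three policy names come from the description:

    #include <stdint.h>

    typedef enum {
        POLICY_CACHEABLE_WRITE_THROUGH,
        POLICY_CACHEABLE_WRITE_BACK,
        POLICY_NON_CACHEABLE_WRITE_THROUGH
    } mem_policy_t;

    /* Decode the two policy bits of the control register; the bit
     * placement and value assignment are assumed, not specified. */
    static mem_policy_t decode_policy(uint32_t ctrl_reg) {
        switch (ctrl_reg & 0x3u) {
        case 0:  return POLICY_CACHEABLE_WRITE_THROUGH;
        case 1:  return POLICY_CACHEABLE_WRITE_BACK;
        default: return POLICY_NON_CACHEABLE_WRITE_THROUGH; /* remaining codes */
        }
    }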

When applying a cacheable write back policy, data that is written to the data cache module 200 is sent to the high-level memory module 50 only through WBB 180. When applying a cacheable write through policy, processor 110 is not stalled, unless a hazard is detected, and data is written both to the data cache module 200 and to WTB 155. Data is not written to the data cache module 200 until its corresponding BDU is valid. Processor 110 can be stalled when applying a non-cacheable write through policy. Those of skill in the art will appreciate that other data memory policies can be applied, without departing from the scope of the invention.

DCU 150 arbitrates between various bus requests initiated by various components of the data channel 130, including the DFU 170, the WTB 155, the TWB 160 and the WBB 180. DCU 150 can apply various well-known arbitration schemes. Usually, the DCU 150 will arbitrate between various bus requests according to the following priority: high-priority bus requests from an optional trace buffer within the data cache (used for trace operations); high-priority bus requests from the WBB 180; previous information unit bus requests from the WTB 155; mandatory fetch requests from the DFU 170; low-priority bus requests from the WTB 155; speculative fetch requests from the DFU 170; and finally low-priority bus requests from the WBB 180.
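
The priority order listed above amounts to a fixed-priority arbiter. The following C sketch is illustrative only; the enumeration order is taken directly from the list:

    #include <stdbool.h>

    /* Request sources, highest priority first. */
    typedef enum {
        REQ_TRACE_HIGH,      /* trace buffer, high priority */
        REQ_WBB_HIGH,        /* WBB 180, high priority */
        REQ_WTB_PREVIOUS,    /* WTB 155, previous information unit */
        REQ_DFU_MANDATORY,   /* DFU 170, mandatory fetch */
        REQ_WTB_LOW,         /* WTB 155, low priority */
        REQ_DFU_SPECULATIVE, /* DFU 170, speculative fetch */
        REQ_WBB_LOW,         /* WBB 180, low priority */
        REQ_COUNT
    } req_source_t;

    /* Fixed-priority arbitration: the lowest-numbered pending source wins. */
    static req_source_t dcu_arbitrate(const bool pending[REQ_COUNT]) {
        for (int s = 0; s < REQ_COUNT; s++)
            if (pending[s])
                return (req_source_t)s;
        return REQ_COUNT; /* nothing is pending */
    }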

DCU 150 can send acknowledgement messages to a requesting component if a request that was sent from that component has won an arbitration cycle. DCU 150 includes an internal request queue and can arbitrate between the various requests according to their priority.

According to an embodiment of the invention, when a write through request 274 is sent by processor 110, cache module 200 can delay its miss/hit determination until receiving an indication that DFU 170 is empty, meaning that if a speculative fetch operation was in progress when the write through request was received then this speculative fetch operation has ended and the speculatively fetched information units are stored in cache module 200.

The DFU 170 can be emptied of data by delaying the execution of speculative fetch operations that were requested by DFU 170 but were not started by DCU 150. The delay process within DFU 170 can be implemented in various manners known in the art. It can include masking low priority requests such as speculative fetch requests, freezing the arbitration process, and the like.
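
One possible reading of this drain process, sketched in C with hypothetical bookkeeping fields: queued speculative requests are masked at the arbiter, and the unit reports empty once the already-started fetches complete.

    #include <stdbool.h>

    /* Hypothetical DFU bookkeeping for the drain process described above. */
    typedef struct {
        int  in_flight; /* speculative fetches already started by the DCU */
        int  queued;    /* speculative fetches requested but not started */
        bool masked;    /* queued requests are held back at the arbiter */
    } dfu_t;

    /* Begin draining: hold back the queued speculative requests... */
    static void dfu_start_drain(dfu_t *d) { d->masked = true; }

    /* ...and report empty once every started fetch has completed. */
    static bool dfu_is_empty(const dfu_t *d) { return d->in_flight == 0; }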

According to yet another embodiment of the invention, the delaying of the hit/miss decision is responsive to the address of the information unit. If the address belongs to a predefined memory range, the delay takes place. Otherwise, the hit/miss determination is executed without delay.

FIG. 7 is a schematic illustration of a device 11, according to another embodiment of the invention.

Device 11 includes DFU 170, data cache 200, DCU 150 and WTB 155. A processor 111 that has a single data bus (XA 122) is connected to data cache 200 and to WTB 155. Data cache 200 is also connected to DFU 170. DFU 170 is connected to DCU 150. DCU 150 is also connected to WTB 155. Data fetch bus 126 is connected to WTB 155 and DFU 170. It is noted that device 11 can implement any of the methods mentioned below. Device 11 differs from device 10 by including a single data bus and a single data bus processor, as well as fewer components. The manner in which WTB 155, data cache 200, DFU 170 and DCU 150 operate is illustrated in reference to FIG. 1-FIG. 5.

FIG. 8 illustrates method 400 for fetching an information unit, according to an embodiment of the invention.

Method 400 starts with stage 410 of receiving a request to execute a write through cacheable operation of the information unit. Referring to the example set forth in the previous drawings, processor 110 provides a request to execute a write through cacheable operation. During this operation an information unit that appears on either one of busses XA 122 or XB 124 is written to a write through buffer (such as WTB 155) and then, via data fetch bus 126, to a high-level memory. The write through cacheable operation also includes writing the information unit to cache module 200, if cache module 200 stores an older version of the information unit.

Stage 410 is followed by stage 440 of emptying a fetch unit from data. The fetch unit can initiate speculative and mandatory fetch operations. This fetch unit is connected to the cache module and to the high-level memory unit.

Conveniently, stage 440 of emptying includes completing an execution of currently executed speculative fetch operations and delaying an execution of requested speculative fetch operations that were not started.

Conveniently, stage 440 of emptying includes sending, to an arbiter that controls access to the high level memory unit, a write through request. The write through request has a higher priority than requests to perform speculative fetch operations. Thus, write through requests, and not speculative fetch operations, will be executed, thus allowing the fetch unit to be emptied.

Stage 440 is followed by stage 450 of determining, when the fetch unit is empty, whether the cache module stores an older version of the information unit. This stage includes determining whether the information unit provided by the processor will cause a hit or a miss.

Conveniently, stage 450 of determining whether the cache module stores a version of the information unit is preceded by receiving, by the cache module, an empty data indication from the fetch unit.

Stage 450 is followed by stage 460 of selectively writing the information unit to the cache module in response to the determination. If a cache hit occurs the information unit provided by the processor will overwrite the older version of the information unit; otherwise (if a cache miss occurs) the information unit will not be stored in the cache module.

Conveniently, stage 460 of selectively writing includes writing the information unit from a write through buffer to the high level memory unit.

Stage 460 is followed by optional stage 470 of completing an execution of a delayed fetch operation.
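
Putting stages 410 through 470 together, a runnable C sketch of method 400 is shown below; every function is a hypothetical stand-in for the hardware interaction it names, not an implementation of the claimed device:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the hardware interactions. */
    static bool fetch_unit_busy = true;
    static void empty_fetch_unit(void) { fetch_unit_busy = false; }
    static bool fetch_unit_empty(void) { return !fetch_unit_busy; }
    static bool cache_stores_older_version(unsigned a) { (void)a; return true; }
    static void cache_overwrite(unsigned a) { printf("overwrite BDU %u\n", a); }
    static void write_through_to_memory(unsigned a) { printf("write through %u\n", a); }
    static void resume_delayed_fetches(void) { /* optional stage 470 */ }

    static void write_through_cacheable(unsigned addr) {
        empty_fetch_unit();                   /* stage 440 */
        while (!fetch_unit_empty())
            ;                                 /* wait: no fetch is in flight */
        if (cache_stores_older_version(addr)) /* stage 450: delayed hit check */
            cache_overwrite(addr);            /* stage 460: overwrite older copy */
        write_through_to_memory(addr);        /* unit also reaches high-level memory */
        resume_delayed_fetches();             /* stage 470 */
    }

    int main(void) { write_through_cacheable(400); return 0; }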

FIG. 9 illustrates method 500 for fetching an information unit according to another embodiment of the invention.

Method 500 enables delaying the hit/miss determination only in certain circumstances. In other circumstances the hit/miss determination is not delayed, but in those circumstances the coherency problem is less significant.

Method 500 differs from method 400 of FIG. 8 by including stage 420 and optional stage 451.

Stage 420 includes determining whether to empty the fetch unit from data before determining whether the cache module stores a version of the information unit. Stage 420 of determining may include comparing an address of the information unit to a predefined range of addresses.

Referring to the example set forth in the previous drawings, predefined address range comparator 244 can compare a received address to a predefined range of addresses and determine whether to stall the hit/miss determination by sending control signals to timing controller 242.
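
Stage 420 then reduces to a range test, as in the following C sketch; the range bounds are illustrative and are not taken from the description:

    #include <stdbool.h>
    #include <stdint.h>

    /* Predefined address range (bounds are illustrative only). */
    #define RANGE_START 0x40000000u
    #define RANGE_END   0x4000FFFFu

    /* Delay the hit/miss determination only for addresses in the range. */
    static bool delay_hit_miss(uint32_t addr) {
        return addr >= RANGE_START && addr <= RANGE_END;
    }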

If there is a need to delay the hit/miss determination, stage 420 is followed by stages 440-470.

If there is no need to delay the hit/miss determination then stage 420 is followed by stage 451 of determining, without delay and regardless of an emptiness level of the fetch unit, whether the cache module stores an older version of the information unit. This stage includes determining whether the information unit provided by the processor will cause a hit or a miss.

Stage 451 is followed by stage 460.

Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and the scope of the invention as claimed. Accordingly, the invention is to be defined not by the preceding illustrative description but instead by the spirit and scope of the following claims.

CLAIMS

1. A method for fetching an information unit, the method comprises: receiving a request to execute a write through cacheable operation of the information unit; emptying a fetch unit from data, wherein the fetch unit is coupled to a cache module and to a high level memory unit; determining, when the fetch unit is empty, whether the cache module stores an older version of the information unit; and selectively writing the information unit to the cache module in response to the determination.

2. The method according to claim 1 wherein the emptying comprises: completing an execution of currently executed speculative fetch operations and delaying an execution of requested speculative fetch operations that were not started.

3. The method according to claim 1 wherein the emptying comprises: sending, to an arbiter that controls an access to the high level memory unit, a write through request; wherein the write through request has a higher priority than requests to perform speculative fetch operations.

4. The method according to claim 1 wherein the selective writing also comprises writing the information unit from a write through buffer to the high level memory unit.

5. The method according to claim 1 wherein the receiving is followed by determining whether to empty the fetch unit from data before determining whether the cache module stores a version of the information unit.

6. The method according to claim 5 wherein the determining whether to empty the fetch unit comprises comparing an address of the information unit to a predefined range of addresses.

7. The method according to claim 1 wherein the determining whether the cache module stores a version of the information unit is preceded by receiving, by the cache module, an empty data indication from the fetch unit.

8. The method according to claim 1 wherein the selectively writing is followed by completing an execution of a delayed fetch operation.

9. A device adapted to fetch an information unit, the device comprises: a processor adapted to generate a request to execute a write through cacheable operation of the information unit; a cache module adapted to determine, when a fetch unit is empty, whether the cache module stores an older version of the information unit and to selectively store the information unit in response to the determination; and an arbiter adapted to empty the fetch unit; wherein the fetch unit is coupled to the cache module and to a high level memory.

10. The device according to claim 9 wherein the arbiter is adapted to complete an execution of currently executed speculative fetch operations and to delay an execution of requested speculative fetch operations that were not started.

11. The device according to claim 9 wherein the device is adapted to send to the arbiter a write through request, wherein the write through request has a higher priority than requests to perform speculative fetch operations.

12. The device according to claim 9 wherein the device comprises a write through buffer adapted to generate requests to write the information unit to the high level memory unit.

13. The device according to claim 9 wherein the device is adapted to determine whether to empty the fetch unit from data before determining whether the cache module stores a version of the information unit.

14. The device according to claim 13 wherein the device is adapted to compare between an address of the information unit and a predefined address range and in response determine whether to empty the fetch unit from data before determining whether the cache module stores a version of the information unit.

15. The device according to claim 9 wherein the cache module is adapted to receive an indication that the fetch unit is empty and in response to determine whether the cache module stores an older version of the information unit.

16. The device according to claim 9 wherein the device is adapted to complete an execution of a delayed fetch operation.

17. The method according to claim 2 wherein the emptying comprises: sending, to an arbiter that controls an access to the high level memory unit, a write through request; wherein the write through request has a higher priority than requests to perform speculative fetch operations.

18. The method according to claim 2 wherein the selective writing also comprises writing the information unit from a write through buffer to the high level memory unit.

19. The method according to claim 3 wherein the receiving is followed by determining whether to empty the fetch unit from data before determining whether the cache module stores a version of the information unit.

20. The method according to claim 2 wherein the determining whether the cache module stores a version of the information unit is preceded by receiving, by the cache module, an empty data indication from the fetch unit.